Data Quality for Temporal Streams

Authors

  • Tamraparni Dasu
  • Rong Duan
  • Divesh Srivastava
Abstract

Temporal data pose unique data quality challenges because of autocorrelations, trends, seasonality, and gaps in the data. Data streams are a special case of temporal data in which velocity, volume, and variety add further layers of complexity to measuring the veracity of the data. In this paper, we discuss a general, widely applicable framework for measuring the data quality of streams in a dynamic environment, one that takes into account the evolving nature of streams. We classify data quality anomalies using four types of constraints, identify violations that could be potential data glitches, and use statistical distortion as a metric for measuring data quality in near real time. We illustrate our framework using commercially available streams of NYSE stock prices, consisting of aggregates of prices and trading volumes collected every minute over a one-year period from November 2011 to November 2012.
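To make the two ideas in the abstract concrete (constraint violations as candidate glitches, and a distortion score comparing data before and after cleaning), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not name its four constraint types or define its distortion metric, so the checks in `find_violations` are hypothetical examples, and `distortion` uses a simple one-dimensional earth mover's distance between equal-size samples as a stand-in.

```python
from datetime import datetime, timedelta
from math import isnan


def find_violations(records, step=timedelta(minutes=1)):
    """Return (index, reason) pairs for records breaking an illustrative constraint.

    `records` is a list of (timestamp, price, volume) tuples. The checks are
    hypothetical examples, not the constraint types used in the paper.
    """
    violations = []
    prev_ts = None
    for i, (ts, price, volume) in enumerate(records):
        if price is None or isnan(price):
            violations.append((i, "missing price"))
        elif price <= 0:
            violations.append((i, "non-positive price"))
        if volume is None or volume < 0:
            violations.append((i, "invalid volume"))
        # Flag any record that does not arrive exactly one step after the previous one.
        if prev_ts is not None and ts - prev_ts != step:
            violations.append((i, "irregular timestamp"))
        prev_ts = ts
    return violations


def distortion(raw, cleaned):
    """1-D earth mover's distance between two equal-size samples.

    For equal-size samples this is the mean absolute difference of the
    sorted values. Used here only as an assumed stand-in for a distortion score.
    """
    if len(raw) != len(cleaned):
        raise ValueError("samples must have the same length")
    a, b = sorted(raw), sorted(cleaned)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Small made-up stream: one bad price and one missing minute.
t0 = datetime(2011, 11, 1, 9, 30)
records = [
    (t0, 10.0, 500),
    (t0 + timedelta(minutes=1), 10.1, 450),
    (t0 + timedelta(minutes=2), -5.0, 470),
    (t0 + timedelta(minutes=4), 10.2, 480),
]
glitches = find_violations(records)

raw_prices = [price for _, price, _ in records]
cleaned_prices = [10.0, 10.1, 10.1, 10.2]  # bad price replaced by the previous value
score = distortion(raw_prices, cleaned_prices)
```

A larger score means the cleaning step changed the distribution of values more. In a streaming setting, one might compute such a score per time window rather than over the whole history, though the abstract does not say how the authors window the data.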

Similar resources

TRACDS: Temporal Relationship Among Clusters for Data Streams

In this paper we propose a new extension to clustering data streams based on the Temporal Relationship Among Clusters for Data Streams (TRACDS). This is not a new clustering algorithm, but rather a way to capture the temporal relationships among clusters that is inherent in the ordering of observations in the data stream. We propose to capture this ordering relationship among the clusters by ov...

Impact of sampling techniques on measured stormwater quality data for small streams.

Science-based sampling methodologies are needed to enhance water quality characterization for setting appropriate water quality standards, developing Total Maximum Daily Loads, and managing nonpoint source pollution. Storm event sampling, which is vital for adequate assessment of water quality in small (wadeable) streams, is typically conducted by manual grab or integrated sampling or with an a...

Estimating Data Stream Quality for Object-Detection Applications

Object-detection applications rely on streams of data gathered from sensors, RFID readers, and image recognition systems, among others. These raw data streams tend to be noisy, including both false positives (erroneous readings) and false negatives (missed readings). Techniques exist for general-purpose cleaning of these types of data streams, based on temporal and/or spatial correlations, as w...

Spatial and Temporal Evaluation of Water Quality in the Kashkan River

The Kashkan River basin is one of the most important watersheds in the west of Iran, where major urban, agricultural and livestock regions are located in its catchment area. The aim of the study reported here is to evaluate the spatial and long temporal variations of surface water quality in the Kashkan River by using Water Quality Index, which aggregates different parameters and their dimensio...

On Clustering Massive Data Streams: A Summarization Paradigm

In recent years, data streams have become ubiquitous because of the large number of applications which generate huge volumes of data in an automated way. Many existing data mining methods cannot be applied directly on data streams because of the fact that the data needs to be mined in one pass. Furthermore, data streams show a considerable amount of temporal locality because of which a direct a...

Journal title:
  • IEEE Data Eng. Bull.

Volume: 39

Publication date: 2016